Method for dependency checking using a scoreboard for a pair of register sets having different precisions

ABSTRACT

A dependency checking method includes a scoreboard which records destination operands of instructions outstanding within the pipeline of a microprocessor. Each single precision register maps to an indication within the scoreboard. Each double precision register which does not overlap with single precision registers maps to an indication within the scoreboard. Double precision registers which overlap single precision registers map to the set of indications corresponding to the overlapping single precision registers. Dependency checking for a source operand is performed by forming a first set of indications corresponding to the double precision registers and a second set of indications corresponding to the single precision registers, then selecting a dependency indication from these sets of indications in response to the source precision and the source register address. By forming the first and second sets of indications, the source register address can be used directly to select the dependency indication from each of the first and second sets of indications.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to the field of microprocessors and, more particularly, to dependency checking mechanisms within microprocessors.

2. Description of the Related Art

Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. Superpipelined microprocessor designs, on the other hand, divide instruction execution into a large number of subtasks which can be performed quickly, and assign pipeline stages to each subtask. An extremely short clock cycle is the goal of superpipelined designs. By overlapping the execution of many instructions within the pipeline, superpipelined microprocessors attempt to achieve high performance. Many microprocessor designs employ a combination of superscalar and superpipeline techniques to achieve performance goals.

As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. Generally, a pipeline comprises a plurality of pipeline stages. Each pipeline stage is configured to perform an operation assigned to that stage upon a value while other pipeline stages independently operate upon other values. When a value exits the pipeline, the function employed as the sum of the operations of each pipeline stage is complete. For example, an "instruction processing pipeline" is a pipeline employed to process instructions in a pipelined fashion. Although the pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.

A problem faced in both superscalar and superpipelined designs is dependency checking. Generally, a first instruction which is subsequent to a second instruction in program order has a dependency on the second instruction if a source operand of the first instruction is (at least in part) the destination operand of the second instruction. The second instruction provides a value used by the first instruction, and therefore the second instruction must be executed prior to the first instruction. Actions taken upon detection of dependency vary depending upon the design, but dependencies generally must be detected.

Dependency checking is difficult in both superscalar and superpipelined designs due to the number of instructions which may be outstanding within the pipeline (e.g. subsequent to dispatch and prior to forwarding of the data in response to executing the instruction). In superscalar designs, many execution units may be employed, each of which may be processing one or more instructions. In superpipelined designs, numerous pipeline stages may be operating upon different instructions concurrently. As mentioned above, many microprocessor designs employ both superscalar and superpipelining techniques, further increasing the number of instructions which may be outstanding. Checking dependencies among these numerous instructions may require substantial logic for comparing the source operands of an instruction to the destination operands of the outstanding instructions.

Generally, a source operand is a value operated upon by a microprocessor in response to an instruction to produce a result. The result is stored according to a destination operand specified by the instruction. Depending upon the microprocessor architecture employed by a particular microprocessor, operands may be memory operands (i.e. operands stored in a memory location, a copy of which may be stored in an optional cache employed by the microprocessor) or register operands (i.e. operands stored in a register within a set of registers architecturally defined as part of the microprocessor). Many architectures, notably reduced instruction set computing (RISC) architectures such as the Scalable Processor Architecture (SPARC™), specify load and store instructions for transferring operands between memory and registers. Other instructions specify register operands as source and destination operands.

The SPARC™ architecture specifies a floating point register set which further complicates dependency checking. Generally, floating point operands are considered to have a precision, which refers to the number of bits in the significand and the size of the exponent. For example, the Institute of Electrical and Electronics Engineers (IEEE) has defined IEEE standard 754, in which a single precision floating point number comprises 8 bits of exponent and 23 bits of significand (not including the implied bit). Alternatively, a double precision floating point number includes 11 bits of exponent and 52 bits of significand. Additional precisions may be defined as desired.
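The field widths quoted above can be checked with a short sketch. This is only an illustration in Python using the standard `struct` module; the function names `fields32` and `fields64` are hypothetical and are not part of the described microprocessor.

```python
import struct

def fields32(x):
    """Split x, encoded as IEEE 754 single precision, into (sign, exponent, significand)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    # 1 sign bit, 8 exponent bits, 23 stored significand bits
    return bits >> 31, (bits >> 23) & 0xFF, bits & ((1 << 23) - 1)

def fields64(x):
    """Split x, encoded as IEEE 754 double precision, into (sign, exponent, significand)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    # 1 sign bit, 11 exponent bits, 52 stored significand bits
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

print(fields32(1.0))   # (0, 127, 0): biased exponent 127, implied leading bit not stored
print(fields64(1.0))   # (0, 1023, 0)
```

The stored significand is zero for 1.0 in both formats because the leading bit is implied rather than stored.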

The SPARC™ floating point register set includes a first set of registers for storing floating point double precision operands and a second set of registers for storing floating point single precision operands. Additionally, the storage allocated to the second set of registers overlaps with half of the double precision storage. Therefore, a dependency may exist between a double precision source operand and a single precision destination operand (or vice versa). Furthermore, the register addresses of the single precision registers and the double precision registers which overlap are generally not equal. Therefore, more complex circuitry than a simple compare of register addresses is used to determine if a dependency exists. Generally, a register address is a value which selects a particular register from a register set.

SUMMARY OF THE INVENTION

The problems outlined above are in large part solved by a dependency checking method in accordance with the present invention. The method includes maintaining a scoreboard which records destination operands of instructions outstanding within the pipeline of the microprocessor. Each single precision register maps to an indication within the scoreboard. Each double precision register which does not overlap with single precision registers maps to an indication within the scoreboard. Double precision registers which overlap single precision registers map to the set of indications corresponding to the overlapping single precision registers. Dependency checking for a source operand is performed by forming a first set of indications corresponding to the double precision registers and a second set of indications corresponding to the single precision registers, then selecting a dependency indication from these sets of indications in response to the source precision and the source register address. Advantageously, dependency checking may be performed rapidly despite the potential dependencies between registers of differing precisions and despite a large number of pipeline stages within the microprocessor. The rapid dependency checking may provide for higher frequency operation of the microprocessor, thereby leading to increased performance of the microprocessor.

By forming the first and second sets of indications, the source register address can be used directly to select the dependency indication from each of the first and second sets of indications. Double precision register addresses for registers which overlap single precision registers need not be decoded to select a corresponding dependency indication from the scoreboard. Dependency checking logic may thereby be simplified.

Broadly speaking, the present invention contemplates a method for dependency checking a source operand in a microprocessor. A plurality of indications corresponding to a first plurality of registers having a first precision and to a second plurality of registers having a second precision is stored in a scoreboard. The plurality of indications identify which of the first plurality of registers and the second plurality of registers are updated by instructions within a pipeline of the microprocessor. A first subset of the plurality of indications corresponding to the first plurality of registers is formed. Additionally, a second subset of the plurality of indications corresponding to the second plurality of registers is formed. A dependency indication is selected from the first subset of the plurality of indications if a source precision corresponding to the source operand is the first precision. On the other hand, the dependency indication is selected from the second subset of the plurality of indications if the source precision is the second precision.

The present invention further contemplates a method for dependency checking a source operand in a microprocessor. The microprocessor has a register file representing a first set of registers having a first precision and a second set of registers having a second precision. A first storage location within the register file is shared by one of the first set of registers and at least two of the second set of registers. A scoreboard is maintained indicating which of a plurality of storage locations which comprise the register file are updated by instructions within a pipeline of the microprocessor. The scoreboard indicates which portions of the first storage location are updated. A first set of indications are formed from the scoreboard. The first set of indications corresponds to the first set of registers. A second set of indications are also formed from the scoreboard. The second set of indications corresponds to the second set of registers. A dependency indication is selected from the first and second sets of indications in response to a source register address and a source precision corresponding to the source operand.

BRIEF DESCRIPTION OF THE DRAWINGS

Other objects and advantages of the invention will become apparent upon reading the following detailed description and upon reference to the accompanying drawings in which:

FIG. 1 is a block diagram of one embodiment of a microprocessor.

FIG. 2 is a block diagram of one embodiment of a floating point register file within the microprocessor shown in FIG. 1.

FIG. 3 is a pipeline diagram of one embodiment of a floating point pipeline employed by the microprocessor shown in FIG. 1.

FIG. 4 is a block diagram of one embodiment of a dispatch unit shown in FIG. 1.

FIG. 5 is a diagram of a scoreboard entry and the interpretation of the entry for both single precision and double precision numbers.

FIG. 6 is a diagram illustrating dependency checking circuitry for a source operand according to one embodiment of a dependency checking unit shown in FIG. 4.

FIG. 7 is an example of scoreboard update and use of the scoreboard for dependency checking.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.

DETAILED DESCRIPTION OF THE INVENTION

Turning now to FIG. 1, a block diagram of one embodiment of a microprocessor 10 is shown. As shown in FIG. 1, microprocessor 10 includes an instruction cache 12, a fetch unit 14, a dispatch unit 16, a working register file 18, a plurality of integer units 20A and 20B, an architectural register file 22, a data cache 24, and a floating point unit 26. Floating point unit 26 includes a floating point (FP) register file 28, an FP add unit 30, an FP multiply unit 32, and an FP divide unit 34. Instruction cache 12 is coupled to fetch unit 14, which is further coupled to dispatch unit 16. Dispatch unit 16 is in turn coupled to floating point unit 26 and working register file 18. Working register file 18 is coupled to integer units 20A and 20B, which are further coupled to architectural register file 22 and data cache 24. Data cache 24 is additionally coupled to FP register file 28. FP register file 28 is coupled to FP add unit 30 and FP multiply unit 32.

Generally speaking, dispatch unit 16 is configured to track floating point registers having an outstanding update (i.e. registers which are destination operands of instructions within the pipelines of the floating point execution units 30, 32 and 34 or integer units 20A and 20B, in the case of a load/store instruction) using a scoreboard. The scoreboard includes indications for each storage location within FP register file 28. Storage locations which are shared between a pair of single precision floating point registers and a double precision floating point register are represented by indications corresponding to each of the single precision registers. The indication corresponding to the double precision floating point register is formed by logically combining the indications corresponding to the single precision floating point registers. For example, each indication may be a bit indicative, when set, that an update to the corresponding register is outstanding. When clear, the bit indicates that no update is outstanding. The logical combination is an ORing function for this example.
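As a minimal sketch of the bit-per-register example above, the following Python models one shared storage location with two indications. The class name `SharedLocationEntry` and its method names are illustrative assumptions, not names used by the described embodiment.

```python
class SharedLocationEntry:
    """Scoreboard indications for one storage location shared by a double
    precision register and a pair of single precision registers."""

    def __init__(self):
        self.upper = False  # update outstanding to the upper single precision register
        self.lower = False  # update outstanding to the lower single precision register

    def mark_single(self, is_lower):
        # A single precision destination sets only its own indication.
        if is_lower:
            self.lower = True
        else:
            self.upper = True

    def mark_double(self):
        # A double precision destination covers both halves.
        self.upper = True
        self.lower = True

    def double_pending(self):
        # The double precision indication is the OR of the two single ones.
        return self.upper or self.lower


entry = SharedLocationEntry()
entry.mark_single(is_lower=True)
print(entry.double_pending())  # True: a pending single write blocks a double read
```

No separate bit is stored for the overlapped double precision register in this sketch; its indication is derived on demand.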

The dispatch unit forms a set of double precision indications for the double precision registers by selecting the indications from the scoreboard which correspond to the double precision registers which are not overlapped by the single precision registers, and by performing the logical combination of the corresponding single precision indications to generate indications corresponding to the double precision registers which are overlapped. A set of single precision indications is formed by selecting the indications from the scoreboard which correspond to the single precision registers. Dependency checking may then be performed rapidly by selecting a dependency indication based upon the register address and the precision of a source operand being checked. The register address selects one of the set of double precision indications and one of the set of single precision indications, and the precision determines which of the sets of indications is to be selected from. Advantageously, a rapid dependency check may be performed even though the number of pipeline stages in which instructions may reside may be quite large. Furthermore, dependencies between different precision operands may be quickly resolved by the dependency checking logic employed by dispatch unit 16. Since the indications for the overlapping double precision registers and single precision registers are combined, the number of indications stored in the scoreboard may be less than the total number of addressable storage locations. For example, 32 single precision registers which overlap 16 of 32 double precision registers may be tracked by a scoreboard having 48 indications, even though there are 64 total addressable locations.
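The set formation and selection described above might be sketched as follows in Python, using the 48-indication example. The layout is an assumption made for illustration: double precision register index d (0 to 15) is taken to overlap single precision registers 2d and 2d+1, and indices 16 to 31 have their own bits. The text does not fix this mapping, and all function names are hypothetical.

```python
NUM_SINGLE = 32       # single precision registers
NUM_DOUBLE = 32       # double precision registers
NUM_OVERLAPPED = 16   # double registers sharing storage with singles (assumed: the first 16)

def new_scoreboard():
    # 32 single precision bits followed by 16 bits for non-overlapped doubles.
    return [False] * (NUM_SINGLE + NUM_DOUBLE - NUM_OVERLAPPED)

def mark_destination(sb, addr, is_double):
    if not is_double:
        sb[addr] = True
    elif addr < NUM_OVERLAPPED:
        sb[2 * addr] = sb[2 * addr + 1] = True   # both overlapping singles
    else:
        sb[NUM_SINGLE + addr - NUM_OVERLAPPED] = True

def single_set(sb):
    return sb[:NUM_SINGLE]

def double_set(sb):
    # OR each pair of single bits for overlapped doubles, then append the rest.
    overlapped = [sb[2 * d] or sb[2 * d + 1] for d in range(NUM_OVERLAPPED)]
    return overlapped + sb[NUM_SINGLE:]

def has_dependency(sb, addr, is_double):
    # The address indexes both sets directly; the precision picks the set.
    singles, doubles = single_set(sb), double_set(sb)
    return doubles[addr] if is_double else singles[addr]

sb = new_scoreboard()
mark_destination(sb, 5, is_double=False)       # single register 5 is being written
print(len(sb))                                 # 48
print(has_dependency(sb, 2, is_double=True))   # True: double 2 overlaps singles 4 and 5
print(has_dependency(sb, 4, is_double=False))  # False
```

Because both sets are indexed by the register address itself, no address translation is needed at check time; only the final choice between the two sets depends on precision.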

Furthermore, the scoreboard approach is scalable to larger numbers of floating point pipelines. The scoreboard tracks outstanding register updates regardless of the number of pipelines and the number of pipeline stages within the pipelines.

Instruction cache 12 is a high speed cache memory for storing instructions. Instruction cache 12 may be structured in any suitable manner, including set associative or direct mapped structures.

Fetch unit 14 is configured to fetch instructions from instruction cache 12 and to provide the fetched instructions to dispatch unit 16. Fetch unit 14 may include branch prediction hardware in order to predict branch instructions taken or not taken. Instructions may be fetched from the predicted address and provided to dispatch unit 16. If a branch misprediction is detected, the fetched instructions are discarded and the correct instructions fetched.

Fetch unit 14 also performs predecoding upon the fetched instructions. The information generated by fetch unit 14 is used to aid dispatch unit 16 in the dispatching of instructions. For example, fetch unit 14 may identify each instruction as either a floating point instruction (for dispatch to floating point unit 26), or an integer instruction (for dispatch to integer units 20A and 20B). Additionally, fetch unit 14 may identify the precision selected by the floating point instructions. According to one embodiment, each instruction encoding indicates which precision is selected via the opcode portion of the instruction. Additional predecoding may be implemented in various embodiments as well.

Dispatch unit 16 receives instructions from fetch unit 14 and dispatches the instructions to integer units 20A-20B or floating point unit 26. Generally, dispatch unit 16 applies a set of dispatch rules to the instructions eligible for dispatch, and dispatches as many instructions as possible during each clock cycle according to the dispatch rules. In one embodiment, the dispatch rules include inhibiting dispatch of an instruction if one or more of the source operands for the instruction is dependent upon another instruction within the instruction pipelines of floating point unit 26 or integer units 20A-20B. Additionally, instructions are dispatched in program order. Other dispatch rules may be implemented according to design choice in various embodiments. In particular, instructions may be dispatched out of program order in other embodiments.

Working register file 18 is used to store operands for reading by instructions being dispatched to integer units 20A-20B. Integer instructions are selected for dispatch by dispatch unit 16 and conveyed to working register file 18, from which the operands are read. The operands and the instruction are subsequently conveyed to the integer unit 20A-20B selected by dispatch unit 16 to execute the instruction. Integer units 20A-20B employ pipelines having one or more stages for executing the instructions, after which the results are written to architectural register file 22 and working register file 18. Working register file 18 and architectural register file 22 are both used for storing integer operands. Architectural register file 22 includes storage for each architected register, while working register file 18 is used to store register values currently in use by integer units 20A-20B. Since working register file 18 is smaller than architectural register file 22, working register file 18 may be accessed more quickly than architectural register file 22. Additionally, working register file 18 may be updated with speculative results and may be recovered from architectural register file 22.

Integer units 20A and 20B may be symmetrical or asymmetrical execution units. Symmetrical execution units are configured similarly, and therefore can execute the same subset of the instruction set employed by microprocessor 10 (e.g. the integer instructions). Asymmetrical execution units employ dissimilar hardware. In this case, the combination of integer units 20A-20B includes enough hardware to execute each of the integer instructions. Additionally, a dispatch rule employed by dispatch unit 16 is created in the case of asymmetrical units to ensure that each instruction is dispatched to a unit configured to execute that instruction.

Integer units 20A-20B are also configured to execute load and store memory operations in order to fetch memory operands. For example, the SPARC™ architecture defines load/store instructions. Load instructions fetch memory operands from memory and place the operands into registers for access by other instructions. Store instructions fetch register operands and store them into memory as specified by a memory operand. Integer units 20A-20B access data cache 24 in order to perform memory operations. Data cache 24 is a high speed cache memory for storing data (i.e. memory operands upon which microprocessor 10 operates in response to a program being executed). Data cache 24 may employ any structure, such as a set-associative or direct-mapped structure. Data cache 24 routes memory operands read in response to load memory operations to either (i) FP register file 28 or (ii) architectural register file 22 and working register file 18, depending upon whether the destination operand of the load memory operation is a floating point register or an integer register.

Dispatch unit 16 dispatches floating point instructions to floating point unit (FPU) 26. The floating point instructions read operands from FP register file 28. The instructions and corresponding operands are then routed to either FP add unit 30 or FP multiply unit 32. Floating point add/subtract type instructions are executed by FP add unit 30, while floating point multiply/divide type instructions begin execution in FP multiply unit 32. Multiply operations complete within FP multiply unit 32, while floating point divide and square root computations are routed to FP divide unit 34 from the first stage of the pipeline within FP multiply unit 32. Floating point divide and square root functions use more pipeline stages than multiply functions in the embodiment of FPU 26 shown in FIG. 1. Furthermore, the number of stages may vary depending upon the operands for the divide and square root instructions. Hence, the divide and square root operations are executed in the divide pipeline.

Turning now to FIG. 2, a block diagram of the arrangement of one embodiment of FP register file 28 is shown. The embodiment of FP register file 28 shown in FIG. 2 is compatible with the SPARC™ definition, according to one embodiment. As shown in FIG. 2, FP register file 28 includes a first set of storage locations 40A-40N and a second set of storage locations 42A-42N. In one embodiment, FP register file 28 includes 16 storage locations 40A-40N and 16 storage locations 42A-42N. It is noted that any number of storage locations 40A-40N and of storage locations 42A-42N may be included, according to the number of registers defined in the floating point register sets. Furthermore, the number of storage locations 40A-40N may differ from the number of storage locations 42A-42N. Storage locations 42A-42N are divided into an upper portion (e.g. reference numerals 44A-44N) and a lower portion (e.g. reference numerals 46A-46N).

Each of storage locations 40A-40N is addressable as a double precision floating point register. On the other hand, storage locations 42A-42N are addressable as either a double precision floating point register or a pair of single precision floating point registers. Each storage location 42A-42N is mapped to a double precision register address and a pair of single precision register addresses. For example, if a double precision register address mapped to storage location 42A is presented to FP register file 28 for reading, the value stored in both upper portion 44A and lower portion 46A is returned. Similarly, an update value provided during a write with a double precision register address mapped to storage location 42A is stored into storage location 42A (i.e. the upper portion of the update value in upper portion 44A and the lower portion of the update value in lower portion 46A). A single precision register address which maps to storage location 42A, on the other hand, is further used to select which of upper portion 44A or lower portion 46A is being addressed. Therefore, FP register file 28 receives both the register address and the precision corresponding to an operand in order to select the correct storage location 40A-40N or 42A-42N. Similarly, dependency checking between source operands and destination operands of instructions which are outstanding within FPU 26 involves both the precision and the register address.
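The addressing behavior described above can be illustrated with a small Python model. This is a sketch under stated assumptions, not the register file of the embodiment: storage locations are modeled as 64-bit integers, double precision addresses 0 to 15 are assumed to map to the shared locations and 16 to 31 to the double-only locations, and an even single precision address is assumed to select the upper portion. The class `FPRegisterFile` and its methods are hypothetical.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

class FPRegisterFile:
    def __init__(self, shared=16, double_only=16):
        self.shared = [0] * shared            # locations 42A-42N (upper | lower)
        self.double_only = [0] * double_only  # locations 40A-40N

    def _double_slot(self, addr):
        n = len(self.shared)
        return (self.shared, addr) if addr < n else (self.double_only, addr - n)

    def read(self, addr, is_double):
        if is_double:
            bank, i = self._double_slot(addr)
            return bank[i]                    # both portions are returned
        loc, is_lower = divmod(addr, 2)       # address also picks the portion
        word = self.shared[loc]
        return word & MASK32 if is_lower else word >> 32

    def write(self, addr, is_double, value):
        if is_double:
            bank, i = self._double_slot(addr)
            bank[i] = value & MASK64
            return
        loc, is_lower = divmod(addr, 2)
        word = self.shared[loc]
        if is_lower:
            self.shared[loc] = (word & ~MASK32 & MASK64) | (value & MASK32)
        else:
            self.shared[loc] = (word & MASK32) | ((value & MASK32) << 32)

rf = FPRegisterFile()
rf.write(0, True, 0x1111111122222222)   # double precision write to a shared location
print(hex(rf.read(0, False)))           # 0x11111111 (upper portion)
print(hex(rf.read(1, False)))           # 0x22222222 (lower portion)
```

The same address value selects different storage depending on the precision flag, which is why a dependency check cannot rely on comparing register addresses alone.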

Turning next to FIG. 3, a pipeline diagram illustrating an exemplary floating point pipeline 50 which may be employed by one embodiment of microprocessor 10 is shown. As shown in FIG. 3, pipeline 50 includes a fetch/predecode stage 52, a dispatch stage 54, and a plurality of execution pipelines 56A-56E (e.g. an LD0 pipeline 56A, LD1 pipeline 56B, FA pipeline 56C, FM pipeline 56D, and FD pipeline 56E). LD0 pipeline 56A includes an operand read stage 58A, an execute stage 60A, a cache stage 62A, and a writeback stage 64A. Similarly, LD1 pipeline 56B includes an operand read stage 58B, an execute stage 60B, a cache stage 62B, and a writeback stage 64B. Similar stages to operand read stages 58A-58B are labeled within FA pipeline 56C and FM pipeline 56D with reference numerals 58C-58D. Similar stages to writeback stages 64A-64B are labeled within FA, FM, and FD pipelines 56C-56E with reference numerals 64C-64E. Additionally, FA pipeline 56C and FM pipeline 56D respectively include first execute stages 66C and 66D, second execute stages 68C and 68D, and third execute stages 70C and 70D. FD pipeline 56E includes first execute stage 66E, second execute stage 68E, etc., through Nth execute stage 70E. LD0 pipeline 56A and LD1 pipeline 56B are implemented within integer units 20A and 20B. FA pipeline 56C is implemented within FP add unit 30. Similarly, FM pipeline 56D is implemented within FP multiply unit 32 and FD pipeline 56E is implemented within FP divide unit 34.

LD0 and LD1 pipelines 56A-56B are used to perform floating point load/store instructions. LD0 pipeline 56A will be described, and LD1 pipeline 56B is similar. During operand read stage 58A, integer register operands are read from working register file 18. During execute stage 60A, the operands are added to form an address corresponding to the memory operand being read. If address translation is enabled, this address is a virtual address and is presented to a translation lookaside buffer (TLB) for conversion to a physical address. During cache stage 62A, the address indexes into data cache 24 and selects the requested data (if the address is a hit, determined by comparing the translated address presented by the TLB to the tag corresponding to the selected cache line). During writeback stage 64A, the requested data is stored into the destination floating point register. For dependency checking purposes, a dependency upon the destination operand of a floating point load instruction exists for a subsequent floating point instruction which uses the destination operand as a source operand until the floating point load instruction exits either LD0 pipeline 56A or LD1 pipeline 56B (whichever pipeline it was dispatched to). According to one embodiment, microprocessor 10 may forward data from cache stages 62A-62B if a dependency is detected, in which case the dependency exists until the floating point load instruction exits the cache stage 62A or 62B. Generally, a dependency exists until the data corresponding to the destination operand of the load/store instruction is available as a source operand of the dependent instruction.

FA pipeline 56C and FM pipeline 56D include operand read stages and writeback stages similar to the description above, except that FP register file 28 is accessed in operand read stages 58C and 58D. Additionally, execution of the floating point instructions within pipelines 56C and 56D is divided into three stages during which the specified arithmetic operation is carried out. Similarly, FD pipeline 56E includes a variable number of execution stages including stages 66E, 68E, and 70E over which execution of the divide and square root operations is accomplished. For dependency checking purposes, a dependency upon the destination operand of a floating point instruction within pipelines 56C-56E exists for a subsequent floating point instruction which uses the destination operand until the floating point instruction exits the corresponding pipeline 56C-56E. According to one embodiment, microprocessor 10 may forward data from execute stages 70C-70E if a dependency is detected, in which case the dependency exists until the floating point instruction exits the corresponding execute stage 70C-70E. Similar to the load/store pipelines, a dependency generally exists until the data corresponding to the destination operand of the floating point instruction is available as a source operand of the dependent instruction.

Rather than including comparator circuitry and other dependency checking logic for handling precision differences at each of the execution pipeline stages within execution pipelines 56A-56E, dispatch unit 16 employs a scoreboard for tracking which floating point registers have outstanding updates. The scoreboard stores indications corresponding to each storage location 40A-40N. Additionally, the scoreboard stores indications corresponding to upper portions 44A-44N and lower portions 46A-46N of storage locations 42A-42N. In a first state, the indication identifies an outstanding update for the corresponding storage location. Conversely, the indication identifies no outstanding update for the corresponding storage location in a second state. Selecting dependency indications from the scoreboard which correspond to the source operands of instructions being considered for dispatch may be performed more rapidly (i.e. in fewer levels of logic) than performing dependency checks against each execution pipeline stage. The scoreboard dependency checking circuitry may therefore be more suitable than conventional comparator-based checking for high frequency implementations of microprocessor 10.

It is noted that, although FD pipeline 56E is described as a pipeline for clarity, FP divide unit 34 may be non-pipelined according to one embodiment. In this case, the pipeline stages shown in FD pipeline 56E represent clock cycles used to perform an operation. However, a new operation may not be placed into the FD "pipeline" until the current operation is finished. The scoreboards within dispatch unit 16 handle pipelined, non-pipelined, or mixed implementations, since the latency of an operation (as opposed to the pipeline stage in which the operation is executing) is tracked in order to remove an indication of dependency from the scoreboard.
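One way to picture latency-based removal of indications is a per-indication countdown, sketched below in Python. This is an illustrative assumption rather than the described circuitry: the text does not say how latency is tracked, and the class `LatencyTracker`, its methods, and the latency values used are hypothetical.

```python
class LatencyTracker:
    """Clears scoreboard indications after a fixed number of clock cycles,
    independent of which pipeline stage holds the operation."""

    def __init__(self, size):
        self.remaining = [0] * size  # cycles until each indication may clear

    def dispatch(self, indices, latency):
        # Record the latency of the operation writing these indications.
        for i in indices:
            self.remaining[i] = max(self.remaining[i], latency)

    def tick(self):
        # One clock cycle elapses for every outstanding operation.
        self.remaining = [max(0, r - 1) for r in self.remaining]

    def pending(self, index):
        return self.remaining[index] > 0


tracker = LatencyTracker(48)
tracker.dispatch([4, 5], latency=3)  # e.g. a double precision result, assumed 3 cycles
tracker.tick()
tracker.tick()
print(tracker.pending(4))            # True: one cycle still outstanding
tracker.tick()
print(tracker.pending(4))            # False: the indication has been removed
```

Because only elapsed cycles matter in this sketch, a non-pipelined divide that occupies its unit for many cycles is handled the same way as a pipelined add.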

Turning next to FIG. 4, a block diagram of one embodiment of dispatch unit 16 is shown. As shown in FIG. 4, dispatch unit 16 includes an instruction queue 80, a dependency checking unit 82, a dispatch control unit 84, a scoreboard unit 86, and a latency control unit 88. Scoreboard unit 86 includes a plurality of preliminary scoreboards 90A-90D and a main scoreboard 92, as well as a dispatch valid storage 94 and update logic 96. Instruction queue 80 is coupled to receive instructions from fetch unit 14. Additionally, instruction queue 80 is coupled to provide instructions to floating point unit 26 and integer units 20A-20B under the control of dispatch control unit 84. Instruction queue 80 is coupled to both dependency checking unit 82 and dispatch control unit 84. Dependency checking unit 82 is coupled to dispatch control unit 84 and to scoreboard unit 86. Dispatch control unit 84 is coupled to preliminary scoreboards 90A-90D, main scoreboard 92, and dispatch valid storage 94. Latency control unit 88 is coupled to update logic 96.

Dependency checking unit 82 uses preliminary scoreboards 90A-90D and main scoreboard 92 to determine if dependencies exist for floating point instructions within instruction queue 80 which are being considered for dispatch. If a dependency is detected, dependency checking unit 82 informs dispatch control unit 84. Among the dispatch rules employed by dispatch control unit 84 is the rule that a floating point instruction is not dispatched if a dependency upon a source operand of the floating point instruction is detected by dependency checking unit 82.

Dependency checking unit 82 detects floating point instructions using the predecode data provided by fetch unit 14 for each instruction. FIG. 4 illustrates an f/i field within each instruction entry in instruction queue 80. The f/i field indicates the type of instruction (floating point or integer). Additionally, the precision selected for the instruction is indicated by the floating point precision field (shown as fp in instruction queue 80). Using the precision and the source register addresses, dependency checking unit 82 selects dependency indications from preliminary scoreboards 90A-90D and main scoreboard 92. The selected dependency indications are then logically combined to determine if a dependency exists for the source operand.

Dispatch control unit 84 applies the dispatch rules to the instructions in instruction queue 80 and causes as many instructions as possible to be dispatched from instruction queue 80. For pipeline 50 shown above, up to four floating point instructions can be selected for dispatch in a particular clock cycle: two floating point load/store instructions, a floating point add/subtract instruction, and a floating point multiply/divide instruction.

There may be many dispatch rules employed by dispatch control unit 84. Generally, the dispatch rules ensure that the instructions selected for dispatch can execute (along with the other instructions previously selected and concurrently selected for dispatch) using the hardware provided within microprocessor 10. Additional dispatch rules may be added to simplify the hardware employed within microprocessor 10. Because the number of dispatch rules may be large (and hence the amount of logic employed to determine which instructions can be dispatched may be large as well), it may be difficult to update main scoreboard 92 to reflect the destination operands of floating point instructions selected for dispatch during the clock cycle that the instructions are dispatched. Preliminary scoreboards 90A-90D are therefore employed. Dispatch control unit 84 forms preliminary scoreboards 90A-90D to reflect the destination operands of floating point instructions preliminarily selected for dispatch to the corresponding pipelines. Near the end of the dispatch clock cycle, dispatch control unit 84 determines which of the preliminarily selected instructions are actually dispatched (i.e. dispatch control unit 84 completes evaluation of the dispatch rules). Dispatch control unit 84 stores, in dispatch valid storage 94, a valid indication corresponding to each floating point pipeline. The valid indication identifies whether or not the corresponding pipeline received a validly dispatched instruction. During the subsequent clock cycle, the valid indications are used by update logic 96 to qualify the merging of preliminary scoreboards 90A-90D into main scoreboard 92. Additionally, dependency checking unit 82 receives the qualified versions of preliminary scoreboards 90A-90D for performing dependency checking during the subsequent clock cycle.
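The qualified merge described above may be sketched in software as follows. This is a minimal Python model under stated assumptions, not the circuitry of update logic 96: each scoreboard is represented as an integer bit mask, and the function names `qualify` and `merge_preliminary` are hypothetical.

```python
def qualify(preliminary, valid):
    """Zero out each preliminary scoreboard whose dispatch was not validated."""
    return [prelim if ok else 0 for prelim, ok in zip(preliminary, valid)]


def merge_preliminary(main, preliminary, valid):
    """Return the main scoreboard with validly dispatched entries merged in."""
    for prelim in qualify(preliminary, valid):
        main |= prelim  # a set bit marks an outstanding update
    return main


# Illustrative values only: four preliminary scoreboards, two of them valid.
main_scoreboard = 0b0001
preliminary_scoreboards = [0b0100, 0b1000, 0b0000, 0b0010]
valid_indications = [True, False, False, True]
merged = merge_preliminary(main_scoreboard, preliminary_scoreboards,
                           valid_indications)
```

In this sketch the bit contributed by the second preliminary scoreboard never reaches the main scoreboard because its valid indication is clear, which mirrors the inhibited merge for an instruction that was not actually dispatched.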

Main scoreboard 92 is thereby updated with indications of destination operands for each floating point instruction dispatched into pipeline 50. The indications are removed (i.e. set to the state indicating no dependency) when the destination operand becomes available for subsequent instructions (e.g. when the destination operand has been stored into FP register file 28 or when the destination operand is available for forwarding). For the embodiment of FIG. 4, latency control unit 88 performs the indication removal portion of updating main scoreboard 92. Latency control unit 88 determines the latency (in clock cycles) for each dispatched floating point instruction and notes the destination operand for that instruction. The latency depends on the type of floating point instruction (and hence which pipeline the instruction is dispatched to), and may depend upon the data operated upon in response to the instruction. When the number of clock cycles determined by latency control unit 88 expires, latency control unit 88 signals update logic 96 to remove the corresponding indication from main scoreboard 92. As an alternative to determining latencies using latency control unit 88, execution pipelines 56A-56E may be configured to signal dispatch unit 16 when data corresponding to an instruction becomes available. A combination of signalling techniques and latency determination may be implemented as well. For example, data dependent latencies may employ the signalling technique while static latencies may use the latency determination technique.
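The latency-based removal of indications may be illustrated with a short Python sketch. It assumes a static latency of at least one clock cycle per instruction, a main scoreboard held as an integer bit mask, and at most one outstanding update per register at a time; the class name `LatencyControl` and its methods are hypothetical.

```python
class LatencyControl:
    """Counts down per-instruction latencies and clears scoreboard bits."""

    def __init__(self):
        self.pending = []  # entries of [cycles_remaining, destination_mask]

    def dispatch(self, latency, dest_mask):
        """Note the latency and destination bits of a dispatched instruction."""
        self.pending.append([latency, dest_mask])

    def tick(self, main):
        """Advance one clock cycle; return main with expired bits removed."""
        still_pending = []
        for entry in self.pending:
            entry[0] -= 1
            if entry[0] <= 0:
                main &= ~entry[1]  # indication removed: data now available
            else:
                still_pending.append(entry)
        self.pending = still_pending
        return main


latency_control = LatencyControl()
main_scoreboard = 0b0110
latency_control.dispatch(2, 0b0010)  # illustrative two-cycle operation
latency_control.dispatch(1, 0b0100)  # illustrative one-cycle operation
after_one = latency_control.tick(main_scoreboard)
after_two = latency_control.tick(after_one)
```

Because only the remaining cycle count is tracked, the same bookkeeping applies whether the execution unit is pipelined or not.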

Turning next to FIG. 5, a diagram of one embodiment of a scoreboard 100 is shown. Each of preliminary scoreboards 90A-90D and main scoreboard 92 may be configured as shown in FIG. 5. Scoreboard 100 includes a first portion 102 and a second portion 104. First portion 102 corresponds to storage locations 40A-40N shown in FIG. 2, and second portion 104 corresponds to storage locations 42A-42N shown in FIG. 2.

First portion 102 comprises a bit corresponding to each of storage locations 40A-40N. The bit, when set, indicates that an update to the corresponding storage location 40A-40N is outstanding within execution pipelines 56A-56E. When clear, the bit indicates that no update to the corresponding storage location 40A-40N is outstanding.

Second portion 104 comprises a pair of bits corresponding to each storage location 42A-42N. One of the bits corresponds to the upper portion 44A-44N of the corresponding storage location 42A-42N, and the other bit corresponds to the lower portion 46A-46N of the corresponding storage location 42A-42N. The bits are indicative, when set, of an outstanding update to the corresponding portion of the storage location and, when clear, of no outstanding update to the corresponding portion of the storage location.

FIG. 5 illustrates, at reference numeral 100a, a mapping of the indications stored in scoreboard 100 to the single precision floating point registers within FP register file 28 for an embodiment employing SPARC™. First portion 102 does not include indications for single precision floating point registers, and therefore is not used. Second portion 104 includes an indication for each single precision register. When a single precision register is used as a destination operand of a floating point instruction, the corresponding indication is set according to the mapping shown at reference numeral 100a. When a single precision register is used as a source operand, the corresponding indication is selected according to the mapping shown at reference numeral 100a.

At reference numeral 100b, FIG. 5 illustrates a mapping of the indications stored in scoreboard 100 to the double precision floating point registers within FP register file 28 for an embodiment employing SPARC™. First portion 102 includes an indication for each of the double precision floating point registers which do not overlap with the single precision floating point registers. Second portion 104 includes an indication for each double precision register which overlaps with a pair of single precision registers. Each double precision indication within second portion 104 comprises the indications corresponding to the single precision floating point registers which overlap the corresponding double precision register. If an update is outstanding for either of the single precision registers, a dependency is detected for the corresponding double precision register. Similarly, an outstanding update to the corresponding double precision register is indicated by setting both indications. Access to either single precision register or to the double precision register thereby detects a dependency.
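The mappings at reference numerals 100a and 100b may be modeled in software as in the following Python sketch. It is illustrative only and assumes 32 single precision registers (SP0-SP31), with double precision registers DP0-DP30 overlapping single precision pairs and DP32-DP62 not overlapping. The class name, method names, and the ordering of bits within each portion are hypothetical.

```python
class Scoreboard:
    """Illustrative software model of scoreboard 100."""

    def __init__(self):
        self.first = [False] * 16   # first portion 102: DP32, DP34, ..., DP62
        self.second = [False] * 32  # second portion 104: SP0 ... SP31

    def mark_sp(self, n):
        """Record an outstanding update to single precision register SPn."""
        self.second[n] = True

    def mark_dp(self, n):
        """Record an outstanding update to double precision register DPn."""
        if n < 32:
            # Overlapping register: set both single precision indications.
            self.second[n] = True
            self.second[n + 1] = True
        else:
            self.first[(n - 32) // 2] = True

    def sp_busy(self, n):
        return self.second[n]

    def dp_busy(self, n):
        if n < 32:
            # Either half outstanding means the double register is outstanding.
            return self.second[n] or self.second[n + 1]
        return self.first[(n - 32) // 2]


sb = Scoreboard()
sb.mark_dp(28)  # sets the indications for SP28 and SP29
sb.mark_sp(25)  # makes DP24 appear outstanding as well
```

After these two updates, a single precision read of SP29 and a double precision read of DP24 both detect a dependency, while SP27 and DP56 do not.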

The mappings shown at reference numerals 100a and 100b in FIG. 5 use either "SP" (single precision) or "DP" (double precision) followed by a register number as defined in the SPARC™ instruction set descriptions. The register address encoded into the instruction is the same as the register number for single precision registers (i.e. a five bit binary number equal to the decimal numbers shown). The register address differs from the register number for double precision registers. If the register number is represented as a six bit binary number, the least significant bit is discarded and the most significant bit is stored into the least significant bit's place to form the five bit register address.
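The double precision address encoding just described may be expressed as a small Python function. This is a sketch for illustration; the function name `dp_register_address` is hypothetical, and the range check assumes even register numbers from 0 through 62.

```python
def dp_register_address(reg_num):
    """Form the five bit register address for double precision register DPn."""
    if reg_num % 2 or not 0 <= reg_num <= 62:
        raise ValueError("expected an even register number from 0 to 62")
    msb = (reg_num >> 5) & 1            # most significant of the six bits
    # Keep bits 4..1, drop bit 0, and place the old bit 5 into bit 0.
    return (reg_num & 0b011110) | msb


addresses = {n: dp_register_address(n) for n in (0, 28, 32, 56, 62)}
```

Under this encoding the overlapping registers DP0-DP30 receive even addresses and the non-overlapping registers DP32-DP62 receive odd addresses.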

It is noted that, in many programs, single precision and double precision updates to the same storage location 42A-42N are not mixed. However, single precision and double precision instructions may be mixed in the same program. The scoreboard organization shown in FIG. 5 adeptly handles the mixed precision and correctly detects dependencies for each precision. Furthermore, in cases in which single precision and double precision updates to the same storage location 42A-42N are mixed, the scoreboard correctly detects dependencies between the two precisions updating the same location.

Turning next to FIG. 6, a block diagram illustrating a portion of one embodiment of dependency checking unit 82 is shown connected to scoreboard 100. The portion illustrated in FIG. 6 provides dependency checking for a source operand of a floating point instruction from scoreboard 100. For pipeline 50, four such portions may be included for two source operands of each of the two floating point instructions (a floating point add/subtract type and a floating point multiply/divide type). Additionally, portions may be included for each preliminary scoreboard 90A-90D and main scoreboard 92. Alternatively, the contents of preliminary scoreboards 90A-90D and main scoreboard 92 may be merged prior to feeding dependency checking unit 82.

Dependency checking unit 82 forms a first set of indications 110 corresponding to the single precision floating point registers. First set of indications 110 is selected from second portion 104 of scoreboard 100. Additionally, dependency checking unit 82 forms a second set of indications 112 corresponding to the double precision floating point registers. Second set of indications 112 is selected from first portion 102 of scoreboard 100 and the output of a logical combination block 114. Logical combination block 114 combines the indications stored in second portion 104 of scoreboard 100 to form the corresponding double precision indications. For the embodiment of FIG. 5, logical combination block 114 comprises a plurality of OR gates 116A-116N. Each OR gate 116A-116N forms a double precision indication from the corresponding single precision indications stored in second portion 104 of scoreboard 100.

Dependency checking unit 82 selects the dependency indication for the source operand from first set of indications 110 and second set of indications 112 based upon the source precision and the source register address. By maintaining the indications as shown in scoreboard 100, the first and second sets of indications may be rapidly generated (e.g., in one level of logic), and then a simple selection process performed. Advantageously, the dependency indication is rapidly generated without a large amount of dependency checking logic.
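The formation of the two sets of indications and the subsequent selection may be sketched in Python as follows. The sketch assumes 16 indications in first portion 102 (DP32-DP62), 32 indications in second portion 104 (SP0-SP31), and a double precision set ordered by five bit register address so that even addresses come from the OR gates and odd addresses come from first portion 102. That ordering and the function names are assumptions made for illustration.

```python
def form_indication_sets(first_portion, second_portion):
    """Return (single precision set, double precision set), indexed by address."""
    sp_set = list(second_portion)  # first set of indications 110
    dp_set = [False] * 32          # second set of indications 112
    for i in range(16):
        # One OR gate per overlapping double precision register.
        dp_set[2 * i] = second_portion[2 * i] or second_portion[2 * i + 1]
        # Non-overlapping registers are taken directly from first portion 102.
        dp_set[2 * i + 1] = first_portion[i]
    return sp_set, dp_set


def select_dependency(sp_set, dp_set, is_double, address):
    """Select one indication using the source precision and register address."""
    return dp_set[address] if is_double else sp_set[address]


first = [False] * 16
second = [False] * 32
second[25] = True   # outstanding update to SP25
first[12] = True    # outstanding update to DP56 (index (56 - 32) // 2)
sp_set, dp_set = form_indication_sets(first, second)
```

Because both sets are indexed by the encoded register address, the selection reduces to a single lookup per source operand once the source precision is known.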

Turning now to FIG. 7, an example of the use of preliminary scoreboards 90A-90D and main scoreboard 92 for dependency checking is shown. An initial state of the scoreboards for a first clock cycle is shown at reference numeral 120, and a final state during a second clock cycle succeeding the first clock cycle is shown at reference numeral 122. Changes between initial state 120 and final state 122 are illustrated by bold-faced numerals which are larger than the other numerals (except for the clearing of preliminary scoreboards 90A-90D of the values from the initial state).

Initial state 120 indicates preliminary dispatch of (i) a floating point instruction to LD0 pipeline 56A having a destination operand of floating point register DP56; (ii) a floating point instruction to LD1 pipeline 56B having a destination operand of floating point register DP28; and (iii) a floating point instruction to FM pipeline 56D having a destination operand of floating point register SP25. Additionally, main scoreboard 92 notes several outstanding updates of both single and double precision registers.

As noted next to arrow 124, the preliminary dispatches of instructions to LD0 pipeline 56A and FM pipeline 56D are determined to be valid. Therefore, the corresponding preliminary scoreboards 90A and 90D shown at reference numeral 120 are merged into main scoreboard 92. On the other hand, since the preliminary dispatch into LD1 pipeline 56B is not validated, the corresponding preliminary scoreboard 90B is not merged into main scoreboard 92. The bold-faced numerals within main scoreboard 92 at reference numeral 122 reflect these updates.

As additionally noted next to arrow 124, instructions having floating point registers SP0, DP28, SP2, and DP56 as destination operands are considered for dispatch during the second clock cycle to LD0 pipeline 56A, LD1 pipeline 56B, FA pipeline 56C, and FM pipeline 56D, respectively. Since instructions considered for dispatch to LD0 pipeline 56A and LD1 pipeline 56B are floating point load/store instructions having integer registers as source operands, no dependencies are detected via the scoreboards shown. Therefore, these instructions are selected for dispatch as illustrated by the bold-faced numerals in preliminary scoreboards 90A and 90B.

The floating point add instruction considered for dispatch to FA pipeline 56C has source operands of SP29 and SP2 in the present example. Source operand SP29 would detect a dependency upon DP14 if the instruction in LD1 pipeline 56B were indicated as validly dispatched. However, that instruction was not validly dispatched. The floating point add instruction is not precluded from dispatch due to having SP29 as a source operand. Source operand SP2 is dependent, however, upon an instruction indicated as outstanding in main scoreboard 92. Therefore, the floating point add instruction is precluded from dispatching. The indication corresponding to SP2 (the destination operand of the floating point add instruction) is therefore reset as illustrated in FIG. 7.

A floating point multiply instruction is also considered for dispatch to FM pipeline 56D in the second clock cycle. The floating point multiply instruction has DP0 and DP56 as source operands. An examination of the DP0 portion of the scoreboards illustrated at reference numeral 120 indicates no dependency for DP0. However, DP56 is dependent upon the instruction preliminarily dispatched to LD0 pipeline 56A during the first clock cycle. Since this instruction was validly dispatched, a dependency is detected. The floating point multiply instruction is thereby precluded from dispatching.
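The example of FIG. 7 may be traced with the following Python sketch. It is a simplified model: the main scoreboard is assumed to hold only the outstanding update to SP2 (the figure shows several others that are not reproduced here), scoreboard contents are modeled as sets of cells rather than bits, and the helper names `cells` and `depends` are hypothetical. The sketch merges the validated preliminary scoreboards before checking, which yields the same result as checking against the qualified preliminary scoreboards and the main scoreboard separately.

```python
def cells(reg):
    """Map a register name such as 'SP25' or 'DP56' to scoreboard cells."""
    kind, num = reg[:2], int(reg[2:])
    if kind == "SP":
        return {("second", num)}
    if num < 32:  # overlaps a pair of single precision registers
        return {("second", num), ("second", num + 1)}
    return {("first", num)}


# First clock cycle: preliminary scoreboards and their valid indications.
preliminary = {"LD0": cells("DP56"), "LD1": cells("DP28"), "FM": cells("SP25")}
valid = {"LD0": True, "LD1": False, "FM": True}

# Assumed contents of main scoreboard 92 (only the SP2 update is modeled).
main = cells("SP2")

# Second clock cycle: only validly dispatched entries are merged.
for pipe, prelim in preliminary.items():
    if valid[pipe]:
        main |= prelim


def depends(reg):
    """True if any cell of the source register has an outstanding update."""
    return bool(cells(reg) & main)


fadd_blocked = depends("SP29") or depends("SP2")   # blocked by SP2 only
fmul_blocked = depends("DP0") or depends("DP56")   # blocked by DP56 only
```

In this trace SP29 shows no dependency because the LD1 dispatch was not validated, while SP2 and DP56 each preclude their instruction from dispatching, consistent with the description above.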

In accordance with the above disclosure, a dependency checking apparatus is shown which allows rapid detection of dependencies among registers having different precisions. Because the apparatus operates rapidly, it may be suitable for higher frequency microprocessors in which other dependency apparatuses fail. Employing the present apparatus in a microprocessor may allow the microprocessor to achieve a higher frequency (i.e. shorter clock cycle), thereby allowing for increased microprocessor performance.

Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.

What is claimed is:
 1. A method for dependency checking a source operand in a microprocessor, comprising: storing a plurality of indications corresponding to a first plurality of registers having a first precision and to a second plurality of registers having a second precision in a scoreboard, wherein said plurality of indications identify which of said first plurality of registers and said second plurality of registers are updated by instructions within a pipeline of said microprocessor; forming a first subset of said plurality of indications corresponding to said first plurality of registers; forming a second subset of said plurality of indications corresponding to said second plurality of registers; selecting a dependency indication from said first subset of said plurality of indications if a source precision corresponding to said source operand is said first precision; and selecting said dependency indication from said second subset of said plurality of indications if said source precision is said second precision.
 2. The method as recited in claim 1 wherein said selecting a dependency indication from said first subset uses said source address to select said dependency indication within said first subset.
 3. The method as recited in claim 2 wherein said selecting said dependency indication from said second subset uses said source address to select said dependency indication within said second subset.
 4. The method as recited in claim 1 wherein said microprocessor includes a register file in which a first storage location is shared by one of said first plurality of registers and at least two of said second plurality of registers.
 5. The method as recited in claim 4 wherein at least two of said plurality of indications corresponding to said at least two of said second plurality of registers are indicative of said first storage location being updated by instructions within said pipeline.
 6. The method as recited in claim 5 wherein said forming said first subset comprises logically combining said at least two of said plurality of indications.
 7. The method as recited in claim 6 wherein said logically combining comprises ORing.
 8. The method as recited in claim 5 further comprising indicating an update to said one of said first plurality of registers by placing each of said at least two of said plurality of indications corresponding to said at least two of said second plurality of registers into a state indicating update.
 9. A method for dependency checking a source operand in a microprocessor having a register file representing a first set of registers having a first precision and a second set of registers having a second precision, wherein a first storage location within said register file is shared by one of said first set of registers and at least two of said second set of registers, the method comprising: maintaining a scoreboard indicating which of a plurality of storage locations which comprise said register file are updated by instructions within a pipeline of said microprocessor, said scoreboard indicating which portions of said first storage location are updated; forming a first set of indications corresponding to said first set of registers from said scoreboard; forming a second set of indications corresponding to said second set of registers from said scoreboard; and selecting a dependency indication from said first and second sets of indications in response to a source register address and a source precision corresponding to said source operand.
 10. The method as recited in claim 9 wherein said forming a first set comprises logically combining a number of said second set of indications corresponding to said portions of said first storage location to generate a first one of said first set of indications corresponding to said one of said first set of registers.
 11. The method as recited in claim 9 wherein said maintaining comprises forming a preliminary scoreboard corresponding to a first instruction being selected for dispatch during a first clock cycle.
 12. The method as recited in claim 11 wherein said maintaining further comprises merging said preliminary scoreboard with said scoreboard during a second clock cycle if said first instruction is validly dispatched during said first clock cycle.
 13. The method as recited in claim 12 wherein said maintaining further comprises inhibiting said merging if said first instruction is not validly dispatched.
 14. The method as recited in claim 11 wherein said maintaining further comprises removing an indication within said scoreboard upon data corresponding to a second instruction which causes said indication to be placed into said scoreboard becoming available.
 15. The method as recited in claim 14 wherein said data becoming available comprises forwarding of said data.
 16. The method as recited in claim 14 wherein said data becoming available comprises said data being stored into said register file.